In this lab, we present a determination of the Hubble constant to 4.5% error. We employ the cosmic distance ladder, in combination with known redshifts of type Ia supernovae, to determine $H_0 = 70.35\ (+3.23,\ -3.09)\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$. Near-infrared photometry from the Wide Field Camera 3 on the Hubble Space Telescope is used to derive a magnitude relation for Cepheids from period and metallicity. Standardized absolute magnitudes of type Ia supernovae are then used, together with known redshifts, to construct a far-reaching velocity-distance trend. The trend is anchored using the galaxy NGC 4258, whose distance is known very precisely thanks to a water maser orbiting its central black hole. The determined Hubble constant falls within one standard deviation of the Riess et al. 2016 value determined using the same anchor.
Observations have shown that most galaxies in our universe move away from us, with velocities proportional to their distances. The proportionality factor between distance and velocity is called the Hubble constant, $H$. It is believed to change with time, with the "current" value known as $H_0$. To determine the Hubble constant we, in principle, only need the distances and velocities of some distant galaxies. Nearby galaxies give poor precision, as their random (peculiar) motions often dominate over the expansion. While finding velocities of distant objects is rather achievable, thanks to the redshift of their EM spectra, determining distances is very challenging and will consume most of our attention. This feat can be accomplished by building a cosmic distance ladder.
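As a minimal numerical sketch of this proportionality (with an assumed round value of $H_0$, purely for illustration, not the value derived later):

```python
# Hubble's law: recession velocity is proportional to distance, v = H0 * d.
# The value of H0 here is an assumed round number for illustration only.
H0 = 70.0          # [km/s/Mpc]
distance = 100.0   # [Mpc]
velocity = H0 * distance  # [km/s]
print(f"A galaxy {distance:.0f} Mpc away recedes at roughly {velocity:.0f} km/s.")
```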
The cosmic distance ladder is a method for measuring the distances of faraway objects, which would be extremely difficult by direct measurement. It is built on the idea of finding different types of stellar objects which have a known distance/property relation for some chosen property. If some of these objects can be observed in the same galaxies, we can approximate their distances to be equal and "chain" their distance/property relations. This allows us to combine our knowledge of different stellar objects with different advantages, like high visibility far from Earth, or very accurately known distances. The ladder also requires the observation of one or more anchors: objects with very accurately known distances, which serve as zero-point references.
In theory, this process could be repeated any number of times, by chaining the known relations of any number of object types. We will limit ourselves to three types of objects: a combination of Cepheids and type Ia supernovae, together with the NGC 4258 maser as a distance anchor.
Cepheids are very bright, relatively rare, periodically pulsating stars. Like many other types of pulsating stars (e.g., RR Lyrae), Cepheids have predictable period/magnitude relations, making them good standard candles for distance measurements in the tens-of-Mpc range. Although not particularly common, they are abundant enough that a respectable number of them can be found in galaxies that also host type Ia supernovae.
Type Ia supernovae are some of the brightest objects in the universe, being the result of exploding white dwarfs. They can be observed at extremely large distances, making them great candidates for deriving a distance-velocity relation. They are, however, also very rare.
One galaxy of special interest is NGC 4258, which happens to contain a maser orbiting its central black hole. Masers are strange and very rare celestial objects which emit a large amount of stimulated microwave radiation. This makes them great distance indicators, and the distance to NGC 4258 is therefore known with unparalleled accuracy and precision. It luckily also contains a respectable number of Cepheids, with both similar and dissimilar metallicities to those in the Milky Way. It therefore provides a great opportunity to calibrate more distant Cepheids and will serve as the anchor for our distance ladder.
We will employ near-infrared observations of Cepheids from the Wide Field Camera 3 on the Hubble Space Telescope for our analysis. The data has conveniently been cataloged by the Riess team (Riess et al. 2016) and can be downloaded in any desired format from the catalog site.
We have downloaded a FITS file containing all needed Cepheid data. Below we parse the file to extract host galaxy, period, magnitudes, and metallicity.
The set contains a total of 1486 stars in 21 galaxies. One of these galaxies is M31, more commonly known as the Andromeda Galaxy. We have to exclude M31 from our dataset because it happens to be moving towards us. This is a problem since we are trying to build a predictive model of how fast objects recede from us. There is nothing incompatible between galaxies moving towards us and the theory of the expanding universe; M31 just happens to be so close to us that random velocities overcome the general expansion.
Removing M31 leaves us with 1114 stars in 20 galaxies. The distribution of Cepheids across galaxies is shown in the histogram below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import corner
import seaborn as sns
# import mpld3
# mpld3.enable_notebook()
from tqdm import trange
from astropy.io import fits
import pystan
import arviz
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
plt.style.use('seaborn-darkgrid')
mpl.rcParams['figure.figsize'] = [14.0, 8.0]
mpl.rcParams['figure.dpi'] = 80
mpl.rcParams['savefig.dpi'] = 100
mpl.rcParams['font.size'] = 12
mpl.rcParams['legend.fontsize'] = 'Large'
mpl.rcParams['figure.titlesize'] = 'Large'
np.random.seed(1942)
with fits.open("asu.fit") as infile:
    data = infile[1].data
nr_stars = len(data)
gal = np.empty(nr_stars, dtype=object) # Galaxy name.
period = np.zeros(nr_stars) # Cepheid periods in [Days].
VImag = np.zeros(nr_stars) # V-I band magnitude, where V=F555W and I=F814W
mag = np.zeros(nr_stars) # H=F160W band magnitude.
mag_err = np.zeros(nr_stars) # Standard magnitude error of F160W
Z = np.zeros(nr_stars) # Metallicity Z = 12 + Log(O/H)
for i in trange(nr_stars):
    gal[i] = data[i][2]
    period[i] = data[i][6]
    VImag[i] = data[i][7]
    mag[i] = data[i][8]
    mag_err[i] = data[i][9]
    Z[i] = data[i][10]
print(f"Initial number of stars is {nr_stars}.")
cut_idx = gal != 'M31'
gal = gal[cut_idx]
VImag = VImag[cut_idx]
period = period[cut_idx]
mag = mag[cut_idx]
mag_err = mag_err[cut_idx]
Z = Z[cut_idx]
logP = np.log10(period)
nr_stars = gal.shape[0]
print(f"Removing M31 reduced number of stars to {nr_stars}.")
gal_names = np.unique(gal)
gal_names = np.delete(gal_names, 12)  # Remove 'N4258' from its alphabetical position...
gal_names = np.append(gal_names, 'N4258')  # ...and re-append it, so the anchor gets the last index.
nr_gals = len(gal_names)
print(f"We are left with {nr_gals} galaxies.")
gal_nrs = np.zeros(nr_stars, dtype=int)
for i, gal_name in enumerate(gal_names):
    gal_nrs[gal == gal_name] = i
plt.figure(figsize=(16,8))
plt.hist(gal_nrs, bins=np.arange(nr_gals+1)-0.5, rwidth=0.8)  # one unit-wide bin centered on each galaxy index
plt.xticks(np.arange(nr_gals), gal_names)
plt.ylabel("Number of Stars")
plt.title("Number of stars by galaxy")
plt.tight_layout();
Before we can exploit the period/magnitude relations of the Cepheids, we must adjust the magnitudes for extinction. The Wesenheit magnitude is an extinction-corrected apparent magnitude, taking into account the dust extinction between us and the object. If we simply used the F160W magnitude, our Cepheid magnitudes would covary with extinction, which would severely impact the precision of our model, as it depends on being able to compare magnitudes between Cepheids, which requires a standardized magnitude. The Wesenheit magnitude is defined as
$$ m_W^H=m_H-R\cdot (V-I)\ , \quad\quad R = \frac{A_H}{A_V - A_I} $$Here, $m_H$ is the apparent magnitude of the object in the H band (F160W):
$$ m_H = M_H + \mu_0,\quad\quad \mu_0 = 5\log(D) + 25 $$with $M_H$ as absolute magnitude, and $D$ as distance in Mpc.
Riess et al. 2016 finds $R=0.39$. Below we have calculated the Wesenheit magnitude of all Cepheids.
R = 0.39
magW = mag - R*VImag
Below, we have plotted the Wesenheit magnitude against log period, color-mapped by metallicity. There is an obvious negative trend in more or less all galaxies. From these plots there doesn't seem to be any noticeable metallicity trend, although one might be revealed through other visualization methods.
Z_min, Z_max = np.min(Z), np.max(Z)
fig, ax = plt.subplots(5, 4, figsize=(18,16))
for i in range(nr_gals):
    cut_idx = gal == gal_names[i]
    current_magW = magW[cut_idx]
    current_log_period = logP[cut_idx]
    current_Z = Z[cut_idx]
    j, k = i//4, i%4
    sc = ax[j,k].scatter(current_log_period, current_magW, s=10, c=current_Z, cmap="gnuplot")
    ax[j,k].set_title(f"Galaxy {gal_names[i]}")
    ax[j,k].set_ylabel("$m_W^H$")
    ax[j,k].set_xlabel("Log(Period) [Days]")
    plt.colorbar(sc, ax=ax[j,k])
fig.tight_layout();
The Cepheids in NGC 4258 are of special importance because the distance to their galaxy is known very precisely. Orbiting the central black hole of NGC 4258 is a water maser - the microwave equivalent of a laser, emitting stimulated EM radiation at 22 GHz. The geometric distance to the maser, and therefore to the galaxy, can be determined very precisely by direct acceleration measurements and by fitting its Keplerian motion around the black hole. Humphreys et al. 2013 explain this process in detail and derive a 3% uncertainty on the geometric distance. The best distance estimate to NGC 4258 we could find was, however, presented in Riess et al. 2016, who claim to have reduced the error to 2.6% by running the MCMC model proposed by Humphreys et al. 2013 for longer. They present a distance modulus of $\mu_\mathrm{0,N4258} = 29.387 \pm 0.0568$, corresponding to a distance of $7.54\, \mathrm{Mpc}$.
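As a quick sanity check of the quoted numbers (a sketch using only the distance-modulus definition $\mu_0 = 5\log(D) + 25$ given earlier):

```python
# Invert the distance modulus relation mu_0 = 5*log10(D) + 25 (D in Mpc)
# to verify that mu_0 = 29.387 indeed corresponds to ~7.54 Mpc.
mu0_N4258 = 29.387
D_N4258 = 10**((mu0_N4258 - 25) / 5)  # [Mpc]
print(f"D(NGC 4258) = {D_N4258:.2f} Mpc")
```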
Masers are very rare, and NGC 4258 is the only galaxy in our model on which we can employ this approach. It will therefore serve as a distance anchor for the rest of our model.
The Cepheids are expected to have predictable relations between their period and standardized (Wesenheit) magnitude. We will also include a magnitude dependence on metallicity, to improve our model. The dependence on both of these parameters is assumed to be a power law, meaning we get a nice linear fit if we use the logarithms of the parameters. The model then consists of the slope coefficients of both parameters, as well as an intercept:
$$ m_{H}^{W}=\mathrm{zp}_\mathrm{W,N4258} + b_{W} \Delta\log P+Z_{W} \Delta \log (\mathrm{O} / \mathrm{H}) $$where $\mathrm{zp}_\mathrm{W,N4258}$, $b_{W}$, and $Z_{W}$ are fitting parameters. The latter two will be re-fit with all galaxies, but the intercept will carry on as an anchor for further calculations.
We have also scaled the metallicity relative to that of our own sun, at $\log(\mathrm{O} / \mathrm{H})_{\mathrm{sun}} = 8.66$, such that
$$ \Delta \log (\mathrm{O} / \mathrm{H}) = \log (\mathrm{O} / \mathrm{H}) - \log(\mathrm{O} / \mathrm{H})_{\mathrm{sun}} $$The log period is likewise centered on the mean of all log periods, such that
$$ \Delta \log P = \log P - \overline{\log P} $$Centering parameters around their means should have no effect on our final best fit, but it helps reduce covariance between parameters.
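The claim that centering reduces parameter covariance can be illustrated with an ordinary least-squares sketch (illustrative only; the actual fit below is done in Stan). For a linear model, the parameter covariance is proportional to $(X^T X)^{-1}$, and centering the covariate zeroes its off-diagonal term:

```python
import numpy as np

# For y = zp + b*x, the parameter covariance is proportional to (X^T X)^{-1}.
# Centering x makes X^T X (numerically) diagonal, removing the
# intercept/slope covariance entirely.
x = np.linspace(0.5, 2.0, 50)  # e.g., a set of log-periods
X_raw = np.column_stack([np.ones_like(x), x])
X_cen = np.column_stack([np.ones_like(x), x - x.mean()])

cov_raw = np.linalg.inv(X_raw.T @ X_raw)
cov_cen = np.linalg.inv(X_cen.T @ X_cen)
print("off-diagonal (raw):     ", cov_raw[0, 1])
print("off-diagonal (centered):", cov_cen[0, 1])
```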
Stan is a probabilistic programming language used for Monte Carlo sampling of hierarchical models. Its Python interface, PyStan, compiles models to C++ and is generally more efficient than most MCMC options. Stan comes equipped with a powerful class of Hamiltonian Monte Carlo (HMC) samplers, which often hold strong advantages over more traditional samplers. For complex, high-dimensional probability surfaces, where traditional sampling methods struggle to converge, HMC shines. This comes at the cost of more calculations per cycle, including the gradient of the log-probability surface, the difficulty of which depends on the problem and ranges from trivial to hugely expensive. However, for the number of dimensions we shall be dealing with (>20), it performs admirably.
HMC is built on the concept of treating the probability surface as a physical potential and exploring it the way a frictionless object would roll around it. It calculates the Hamiltonian of the system (from which it gets its name) and integrates the resulting equations of motion with leap-frog steps in order to explore the probability surface. Here, we will employ a popular HMC variant called the No-U-Turn Sampler (NUTS). It is a special adaptation of HMC which attempts to explore the surface as efficiently as possible by terminating a trajectory once it starts to double back on itself ("no U-turns"), avoiding wasted computation re-exploring the same path.
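The leap-frog dynamics at the heart of HMC can be sketched for a 1-D standard-normal target, where the potential is $U(q) = q^2/2$ (a minimal illustration only; Stan's implementation is far more sophisticated):

```python
# One HMC trajectory on U(q) = q^2/2 (the negative log density of a standard
# normal), integrated with leap-frog steps. A symplectic integrator keeps the
# Hamiltonian H = U(q) + p^2/2 nearly constant along the path, which is what
# gives HMC its high acceptance rates.
def grad_U(q):
    return q  # dU/dq for U(q) = q^2 / 2

def leapfrog(q, p, eps, n_steps):
    p = p - 0.5 * eps * grad_U(q)      # initial half-step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full position step
        p = p - eps * grad_U(q)        # full momentum step
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)      # final half-step for momentum
    return q, p

q0, p0 = 1.0, 0.5
H_start = 0.5 * q0**2 + 0.5 * p0**2
q1, p1 = leapfrog(q0, p0, eps=0.01, n_steps=100)
H_end = 0.5 * q1**2 + 0.5 * p1**2
print(f"energy drift along trajectory: {abs(H_end - H_start):.2e}")
```

For small step sizes the energy drift is tiny, so proposals at the trajectory endpoint are accepted with high probability.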
logP = logP - np.mean(logP)
Z_ref = 8.66
dOH = Z - Z_ref
N4258_cut_idx = gal == "N4258"
dOH_N4258 = dOH[N4258_cut_idx]
magW_N4258 = magW[N4258_cut_idx]
logP_N4258 = logP[N4258_cut_idx]
mag_err_N4258 = mag_err[N4258_cut_idx]
We will be fitting the magnitude relation by sampling its posterior in Stan, assuming Gaussian errors on the magnitude (with uniform priors, the posterior maximum coincides with the maximum-likelihood estimate). We will also include an intrinsic scatter parameter, which tells us how much of the spread our model leaves unexplained.
Below, the Stan code is shown and compiled. The code is divided into four sections: data, parameters, transformed parameters, and model. The data section declares all data to be provided, while the parameters section does the same for all fitting parameters Stan should create. The transformed parameters section allows new quantities to be computed from the parameters; here we calculate the right-hand side of the fitting equation above, as well as the total standard error on the left-hand side, including the intrinsic scatter. The model section contains the posterior, where we place the priors on the fitting parameters, as well as the Gaussian likelihood of the magnitudes.
We have used uninformative (uniform) priors for all four parameters, set to reasonable ranges after some initial testing.
The model is run with 20000 iterations for each of 8 parallel chains. The first half of them are discarded as warmup, as any MC sampler is initially influenced by its initial conditions (which have been set randomly within the uniform priors).
# set prior characteristics
stan_dict = {}
stan_dict['zp_lower'], stan_dict['zp_upper'] = 23.0-5, 30.0-5
stan_dict['bW_lower'], stan_dict['bW_upper'] = -5.0, 0.0
stan_dict['ZW_lower'], stan_dict['ZW_upper'] = -4.0, 4.0
stan_dict['s_lower'], stan_dict['s_upper'] = 0.0, 2.0
stan_code = """
data {{
// define the data within Stan
int<lower=0> N; // number of data points
real magW[N]; // Wesenheit magnitudes (dependent variable)
real logP[N]; // centered log periods
real dOH[N]; // metallicity offsets
real<lower=0> sigma_magW[N]; // magnitude error bars
}}
parameters {{
// model parameters
real<lower={zp_lower}, upper={zp_upper}> zp;
real<lower={bW_lower}, upper={bW_upper}> bW;
real<lower={ZW_lower}, upper={ZW_upper}> ZW;
real<lower={s_lower}, upper={s_upper}> s;
}}
transformed parameters{{
// construct the model (i.e., theta=mx+b)
real theta[N];
real tot_sigma[N];
for (j in 1:N)
theta[j] = zp + bW * logP[j] + ZW * dOH[j];
for (j in 1:N)
tot_sigma[j] = sqrt(square(sigma_magW[j]) + square(s));
}}
model {{
zp ~ uniform({zp_lower}, {zp_upper}); // uniform priors on all parameters
bW ~ uniform({bW_lower}, {bW_upper});
ZW ~ uniform({ZW_lower}, {ZW_upper});
s ~ uniform({s_lower}, {s_upper});
magW ~ normal(theta, tot_sigma); // Gaussian likelihood, with intrinsic scatter folded into tot_sigma
}}
"""
# create the model
sm = pystan.StanModel(model_code=stan_code.format(**stan_dict))
# set the data
linear_data = {'N': len(logP_N4258),
'magW': magW_N4258,
'logP': logP_N4258,
'dOH' : dOH_N4258,
'sigma_magW': mag_err_N4258}
Nsamples = 20000 # number of samples
warmup = 10000 # number of burn in samples (stan default is half of Nsamples)
chains = 8 # number of chains
thin = 1 # discard every n=thin samples in the end
seed = 128 # random seed for starting the chain
# run the model
fit = sm.sampling(data=linear_data, iter=Nsamples, chains=chains, warmup=warmup, thin=thin, seed=seed, init='random')#, control={'adapt_delta' : 2})
samples = fit.extract(permuted=True) # Extracting Sample Data
stackedsamples = np.vstack((samples['zp'], samples['bW'], samples['ZW'], samples['s'])).T # pull out parameters of interest and reshape them
Below, a printout of the median values, together with error deviations in each direction, and signal-to-noise ratio (SNR) is provided. The intercept has a very high SNR, which is good, as it is the only one of the parameters we actually care about. The period slope has a pretty standard SNR, while the metallicity slope lies very low. This is expected, as the period dependence is very evident, while the metallicity dependence is much less so. As a matter of fact, we can't even tell whether the slope is positive or negative within one sigma.
From the corner plots shown below, we see nice Gaussian-like behavior in all parameter samples. There is almost no covariance, with the exception of that between the intercept and the metallicity slope.
Further below, we see a scatter plot of the data with error bars, together with the best-fit line in red. We have also overplotted, in light grey, a random set of 50 samples from the posterior. A random value of the opposing parameter was chosen, to reflect the actual scatter (i.e., when plotting a sampled relation against log period, a random metallicity from the same galaxy is chosen). As we can see, there is a very clear trend in period and no distinguishable trend in metallicity.
zp, bW, ZW, s = np.percentile(stackedsamples, [16, 50, 84], axis=0).T
zp_N4258, zp_N4258_sigma = zp[1], (zp[2]-zp[0])/2
print(f"zp = {zp[1]:8.4f} (+{zp[2]-zp[1]:6.4f}, -{zp[1]-zp[0]:6.4f}) -- SNR = {zp[1]*2/(zp[2]-zp[0]):.2f}")
print(f"bW = {bW[1]:8.4f} (+{bW[2]-bW[1]:6.4f}, -{bW[1]-bW[0]:6.4f}) -- SNR = {abs(bW[1])*2/(bW[2]-bW[0]):.2f}")
print(f"ZW = {ZW[1]:8.4f} (+{ZW[2]-ZW[1]:6.4f}, -{ZW[1]-ZW[0]:6.4f}) -- SNR = {abs(ZW[1])*2/(ZW[2]-ZW[0]):.2f}")
print(f" s = {s[1]:8.4f} (+{s[2]-s[1]:6.4f}, -{s[1]-s[0]:6.4f}) -- SNR = {s[1]*2/(s[2]-s[0]):.2f}")
stackedsamples = np.vstack((samples['zp'], samples['bW'], samples['ZW'], samples['s'])).T
fig = corner.corner(stackedsamples, labels=[r"$zp$", r"$bW$", r"$ZW$", r"$s$"], range=[(21.5,22.5), (-3.7,-2.7), (-1.4, 1.4), (0, 0.3)], bins=40)
fig, ax = plt.subplots(1, 2, figsize=(18,5))
for i in np.random.randint(len(samples['zp']), size=50):
    ax[0].plot(logP, samples['zp'][i] + samples['bW'][i]*logP + samples['ZW'][i]*np.random.choice(dOH_N4258), c="k", alpha=0.1)
ax[0].errorbar(logP_N4258, magW_N4258, yerr=mag_err_N4258, fmt=",")
sc = ax[0].scatter(logP_N4258, magW_N4258, s=50, c="navy")
ax[0].plot(logP, zp[1] + bW[1]*logP + ZW[1]*np.mean(dOH_N4258), c="r")
ax[0].set_xlabel("$\Delta$ Log(Period) [Days]");
ax[0].set_ylabel("$m_W^H$");
for i in np.random.randint(len(samples['zp']), size=50):
    ax[1].plot(dOH, samples['zp'][i] + samples['bW'][i]*np.random.choice(logP_N4258) + samples['ZW'][i]*dOH, c="k", alpha=0.1)
ax[1].errorbar(dOH_N4258, magW_N4258, yerr=mag_err_N4258, fmt=",")
sc = ax[1].scatter(dOH_N4258, magW_N4258, s=50, c="navy")
ax[1].plot(dOH, zp[1] + bW[1]*np.mean(logP_N4258) + ZW[1]*dOH, c="r");
ax[1].set_xlabel("$\Delta$ Log(O/H)");
ax[1].set_ylabel("$m_W^H$");
ax[1].set_xlim([0, 0.4]);
Consider the magnitude model employed in the previous section, and imagine it applied to Cepheids in a different galaxy with some distance modulus $\mu_{0,i}$. We wish to take advantage of our very accurate knowledge of the distance to our anchor, NGC 4258, as well as our assumption that all Cepheids follow the same magnitude model. We therefore employ the equation used in the previous section, this time with the intercept $zp_{W,N4258}$ fixed at the derived value. Since this intercept was fitted for NGC 4258 specifically, we introduce a distance modulus difference, $\Delta \mu_{0,i} = \mu_{0,i} - \mu_{0,N4258}$, which accounts for the magnitude difference resulting from the varying distances to the galaxies. It can be thought of as moving all the Cepheids into NGC 4258, to make them follow the same trend. The reason this works is that the distance modulus is defined to be exactly the magnitude difference induced by changing the distance to an object: $\Delta m = \Delta \mu$.
$$ m_{H,i,j}^W = \underbrace{(\mu_{0,i} - \mu_{0,N4258})}_{\Delta \mu_{0,i}} + zp_{W,N4258} + b_W \Delta\log{P_{i,j}} + Z_W\Delta\log{(O/H)_{i,j}} $$Technically, the intercept will not be fixed but will be given a Gaussian prior with the mean and standard error found in the last section. This will not change the best-fit results, but it allows proper error propagation from the intercept into the other derived values.
The code implementation is similar to before, employing uniform priors on all parameters. One large difference is the addition of 19 new parameters, namely the distance modulus differences to NGC 4258 for all galaxies. These all need to be fitted separately. As can be noted above, there are now also two indices to account for: the star and the galaxy. The slopes are shared between all stars, while the distance modulus is only shared between stars in the same galaxy, making for a somewhat odd set of equations.
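The two-index structure is easier to see in vectorized form. Stan's per-star lookup `dmu[gal_nrs[j]]` corresponds to NumPy fancy indexing (a toy sketch with made-up values; Stan indexes from 1, NumPy from 0):

```python
import numpy as np

# Toy version of the hierarchical mean: each star j gets the shared slopes
# plus the distance-modulus offset of its own host galaxy.
dmu     = np.array([0.5, 2.0, 3.1])             # one offset per galaxy (toy values)
gal_nrs = np.array([0, 0, 1, 2, 2])             # host-galaxy index per star
logP    = np.array([0.1, -0.2, 0.3, 0.0, 0.4])  # centered log periods (toy values)
dOH     = np.array([0.05, 0.00, -0.02, 0.10, 0.03])
zp, bW, ZW = 22.0, -3.2, -0.2                   # shared parameters (toy values)

theta = dmu[gal_nrs] + zp + bW * logP + ZW * dOH  # per-star model magnitude
print(theta)
```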
We still run with Stan's No-U-Turn sampler, employing 8 chains, 20000 iterations, and warmup for 10000, as before. Intrinsic scatter is also included.
mu0N4258, mu0N4258_sigma = 29.387, 0.0568
cut_idx_galaxies = gal_names != 'N4258'
cut_idx_stars = gal_nrs != 19
magW2 = magW[cut_idx_stars]
mag_err2 = mag_err[cut_idx_stars]
logP2 = logP[cut_idx_stars]
dOH2 = dOH[cut_idx_stars]
gal_nrs2 = gal_nrs[cut_idx_stars]
nr_stars2 = magW2.shape[0]
nr_gals2 = nr_gals-1
# set prior characteristics
stan_dict2 = {}
stan_dict2['dmu_lower'], stan_dict2['dmu_upper'] = -5, 30.0
stan_dict2['bW_lower'], stan_dict2['bW_upper'] = -5.0, 0.0
stan_dict2['ZW_lower'], stan_dict2['ZW_upper'] = -5.0, 5.0
stan_dict2['s_lower'], stan_dict2['s_upper'] = 0.0, 2.0
stan_dict2['zp_N4258_mean'], stan_dict2['zp_N4258_sigma'] = zp_N4258, zp_N4258_sigma
# create the model
stan_code2 = """
data {{
// define the data within Stan
int<lower=0> N; // number of data points
int<lower=0> n; // number of galaxies
int gal_nrs[N];
real magW[N];
real logP[N];
real dOH[N];
real<lower=0> sigma_magW[N];
real zp_N4258_mean;
real zp_N4258_sigma;
}}
parameters {{
// model parameters
real<lower={dmu_lower}, upper={dmu_upper}> dmu[n];
real<lower={bW_lower}, upper={bW_upper}> bW;
real<lower={ZW_lower}, upper={ZW_upper}> ZW;
real<lower={s_lower}, upper={s_upper}> s;
real<lower=zp_N4258_mean-2*zp_N4258_sigma, upper=zp_N4258_mean+2*zp_N4258_sigma> zp_N4258;
}}
transformed parameters{{
// construct the model
real theta[N];
real tot_sigma[N];
for (j in 1:N)
theta[j] = dmu[gal_nrs[j]] + zp_N4258 + bW * logP[j] + ZW * dOH[j];
for (j in 1:N)
tot_sigma[j] = sqrt(square(sigma_magW[j]) + square(s));
}}
model {{
for (j in 1:n)
dmu[j] ~ uniform({dmu_lower}, {dmu_upper});
bW ~ uniform({bW_lower}, {bW_upper});
ZW ~ uniform({ZW_lower}, {ZW_upper});
s ~ uniform({s_lower}, {s_upper});
magW ~ normal(theta, tot_sigma);
zp_N4258 ~ normal(zp_N4258_mean, zp_N4258_sigma);
}}
"""
sm2 = pystan.StanModel(model_code=stan_code2.format(**stan_dict2))
Nsamples = 20000 # number of samples
warmup = 10000 # number of burn in samples (stan default is half of Nsamples)
chains = 8 # number of chains
thin = 1 # discard every n=thin samples in the end
seed = 128 # random seed for starting the chain
# set the data
linear_data2 = {'N': nr_stars2,
'n' : nr_gals2,
'gal_nrs' : gal_nrs2+1, # Stan indexes from 1.
'magW': magW2,
'logP': logP2,
'dOH' : dOH2,
'sigma_magW': mag_err2,
'zp_N4258_mean' : zp_N4258,
'zp_N4258_sigma' : zp_N4258_sigma}
# run the model
fit2 = sm2.sampling(data=linear_data2, iter=Nsamples, chains=chains, warmup=warmup, thin=thin, seed=seed)#, control={'adapt_delta' : 0.99})
samples2 = fit2.extract()
pystan.check_hmc_diagnostics(fit2)
Below we see that the slopes have massively increased in SNR, with a slight bump for the intercept as well (which was allowed to move slightly around the mean from the last run). The metallicity trend is now actually established to be negative to within one sigma (and probably more).
The massive corner plot below shows nice Gaussian-looking samples for all parameters and low covariance between almost all of them. Since its histograms are virtually unreadable, these are also provided separately in a figure below. Most parameters have very nice Gaussian-looking shapes, with the interesting exception of the zp intercept, even though it was given a Gaussian prior.
dmu2 = np.percentile(samples2['dmu'], [16,50,84], axis=0)
dmu2_sigma = (dmu2[2]-dmu2[0])/2
zp2 = np.percentile(samples2['zp_N4258'], [16,50,84])
bW2 = np.percentile(samples2['bW'], [16,50,84])
ZW2 = np.percentile(samples2['ZW'], [16,50,84])
s2 = np.percentile(samples2['s'], [16,50,84])
print(f"zp = {zp2[1]:8.4f} (+{zp2[2]-zp2[1]:6.4f}, -{zp2[1]-zp2[0]:6.4f}) -- SNR={zp2[1]*2/(zp2[2]-zp2[0]):.2f}")
print(f"bW = {bW2[1]:8.4f} (+{bW2[2]-bW2[1]:6.4f}, -{bW2[1]-bW2[0]:6.4f}) -- SNR={abs(bW2[1])*2/(bW2[2]-bW2[0]):.2f}")
print(f"ZW = {ZW2[1]:8.4f} (+{ZW2[2]-ZW2[1]:6.4f}, -{ZW2[1]-ZW2[0]:6.4f}) -- SNR={abs(ZW2[1])*2/(ZW2[2]-ZW2[0]):.2f}")
print(f" s = {s2[1]:8.4f} (+{s2[2]-s2[1]:6.4f}, -{s2[1]-s2[0]:6.4f}) -- SNR={s2[1]*2/(s2[2]-s2[0]):.2f}")
samples2 = fit2.extract(permuted=True) # fit contains all information on chains
stackedsamples2 = np.vstack((*samples2['dmu'].T, samples2['zp_N4258'], samples2['bW'], samples2['ZW'], samples2['s'])).T # pull out parameters of interest and reshape them
fig = corner.corner(stackedsamples2, labels=[*[r"$\Delta\mu$ "+str(gal) for gal in gal_names[:-1]], r"$zp$", r"$bW$", r"$ZW$", r"$s$"]) # corner plot of all 23 parameters: 19 dmu's + zp + bW + ZW + s
samples_data2 = fit2.extract()
fig, ax = plt.subplots(1, 4, figsize=(18,2))
names_list = ['zp_N4258', 'bW', 'ZW', 's']
for i, name in enumerate(names_list):
    sns.distplot(samples_data2[name], hist=False, ax=ax[i])
    ax[i].set_title(name)
fig, ax = plt.subplots(4, 5, figsize=(18,8))
for i in range(19):
    sns.distplot(samples_data2['dmu'][:,i], hist=False, ax=ax[i//5, i%5])
    ax[i//5, i%5].set_title('$\Delta \mu$ ' + gal_names[i])
ax[-1,-1].axis('off');
plt.tight_layout()
Overplotting the new samples of the shared parameters on the corner plot from the previous NGC 4258-only run, we see that the uncertainty in all parameters (except the intercept, zp) is vastly reduced. The new values also lie comfortably within the previous estimates, which is a good sign. It is also very interesting to note that the covariance between the metallicity slope and the intercept is entirely gone.
fig = corner.corner(stackedsamples2[:,19:], color="crimson", labels=['zp', 'bW', 'ZW', 's'], range=[(21.5,22.5), (-3.7,-2.7), (-1.4, 1.4), (0, 0.3)])
corner.corner(stackedsamples, fig=fig, labels=['zp', 'bW', 'ZW', 's'], range=[(21.5,22.5), (-3.7,-2.7), (-1.4, 1.4), (0, 0.3)]);
Below we have both printed and plotted a comparison of our derived distance moduli with those of Riess et al. 2016 for all galaxies. We see that all but one outlier lie within one standard deviation of our observations, while, interestingly, more values lie outside the R16 standard deviations. They have probably reduced their errors further than we were able to. A bias is also revealed: we consistently predict higher values than R16.
R16_table5_muCephs = np.array([
29.135,
32.497,
32.523,
31.307,
31.311,
31.511,
32.498,
32.072,
31.908,
31.587,
31.737,
31.290,
31.080,
30.906,
31.532,
31.786,
32.263,
31.499,
32.919])
R16_table5_muCephs_err = np.array([
0.045,
0.081,
0.055,
0.057,
0.045,
0.053,
0.090,
0.049,
0.043,
0.070,
0.069,
0.112,
0.292,
0.053,
0.071,
0.046,
0.102,
0.078,
0.063
])
print("Observed vs Riess et al. 2016 values for the distance modulus difference to NGC 4258.")
print("Galaxy Name | Observed | R16")
print("-----------------------------")
for i in range(nr_gals2):
    print(f"{gal_names[i]:11s} | {dmu2[1,i]:8.3f} | {R16_table5_muCephs[i]-29.387:8.3f}")
plt.figure(figsize=(18,10))
plt.errorbar(R16_table5_muCephs, dmu2[1]+mu0N4258, xerr=R16_table5_muCephs_err, fmt=".", c="crimson", markersize=10)
plt.errorbar(R16_table5_muCephs, dmu2[1]+mu0N4258, yerr=np.sqrt(dmu2_sigma**2 + mu0N4258_sigma**2), fmt=".", c="navy")
plt.plot([29, 33.2], [29, 33.2], c="y", ls="--")
plt.xlabel("Riess et al. 16 $\mu_0$"); plt.ylabel("Observed $\mu_0$");
A fundamental assumption in our approach is that the absolute magnitudes of type Ia supernovae are constant. This is approximately, but not exactly, true. If it were, applying the distance modulus difference to any apparent type Ia supernova magnitude would yield a constant value: $M_x^0 = m_{x,i}^0 - \mu_{0,i}$, where $M_x^0$ carries no index $i$. To make this relationship hold, we apply a minor correction, or standardization, to the apparent magnitudes. This correction has already been carried out in Riess et al. 2016, with the corrected apparent magnitude values given in their table 5. Riess et al. performed the SALT2 standardization, introduced in Guy et al. 2005 and Guy et al. 2010. The standardization is built on knowledge of how the peak and decay length of type Ia lightcurves relate to intrinsic luminosity, and is performed by parameterizing the lightcurves with a minimal set of fitting parameters, trained on accurate, well-studied data.
The Hubble constant will, in the end, be measured as $$ \log{H_0} = \frac{M_B^0 + 5a_B + 25}{5} $$
where $a_B$ is the intercept of the type Ia magnitude-redshift relation. We will not derive it, nor explain its derivation, here. Its value is provided, combined with the standardized apparent magnitudes, in table 5 of Riess et al. 2016, as $m_{B,i}^0 + 5a_B$.
We can express the apparent magnitude of any object as that of another object plus their distance modulus difference. We thus have
$$ m_{x,i}^0 = \Delta\mu_{0,i} + m_{x, \mathrm{N4258}}^0 $$where $m_{x,i}^0$ are the corrected apparent supernova magnitudes, and $m_{x, \mathrm{N4258}}^0$ would be the apparent magnitude of a supernova in NGC 4258, should one exist. We can further exploit that this apparent magnitude can be rewritten in terms of the absolute magnitude as $m_{x, \mathrm{N4258}}^0 = M_x^0 + \mu_\mathrm{0,N4258}$, which leaves us with
$$ m_{x,i}^0 = \Delta\mu_{0,i} + M_x^0 + \mu_{0, \mathrm{N4258}} $$where we have removed the galaxy index on the absolute supernova magnitude, as these are assumed to be constant. This leaves us with a single new fitting parameter, $M_x^0$, the absolute magnitude of any type Ia supernova. We will also be fitting for $\Delta\mu_{0,i}$, which gives us a means of coupling this equation with the Cepheid one described in the previous section, as the distance modulus of each galaxy is constrained to be the same.
Riess et al. 2016 provide the apparent magnitudes together with the redshift intercept $a_B$. The same table also contains the standard error of this expression. To avoid trying to de-couple the errors into the two contributing terms, we will simply add $a_B$ to our fitting equation, such that
$$ m_{x,i}^0 + 5a_B = \Delta\mu_{0,i} + M_x^{0*} + \mu_{0, \mathrm{N4258}},\quad\quad M_x^{0*} = M_x^0 + 5a_B $$where $M_x^{0*}$ will be our new fitting parameter.
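Once $M_x^{0*}$ is fitted, the Hubble constant follows directly from the earlier relation $\log{H_0} = (M_B^0 + 5a_B + 25)/5$, since $M_x^{0*} = M_B^0 + 5a_B$ already absorbs the redshift-trend intercept. A sketch of the conversion, with a hypothetical value of $M_x^{0*}$ (not the actual fit result):

```python
# log10(H0) = (M_x^{0*} + 25) / 5, where M_x^{0*} = M_B^0 + 5*a_B.
# The value below is hypothetical, chosen only to illustrate the conversion.
Mx0_star = -15.76  # hypothetical fitted value of M_x^{0*}
H0 = 10**((Mx0_star + 25) / 5)  # [km/s/Mpc]
print(f"H0 = {H0:.1f} km/s/Mpc")
```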
We will also be using the distance modulus of NGC 4258, presented in Riess et al. 2016. It is
$$ \mu_\mathrm{0,N4258} = 29.387\ (\pm 0.0568) $$
ab = 0.71273
mag_IA_plus_ab = np.array([13.310,\
17.015,\
16.756,\
15.482,\
15.765,\
15.840,\
16.527,\
16.476,\
16.265,\
16.048,\
15.795,\
15.797,\
15.110,\
15.177,\
15.983,\
16.265,\
16.572,\
15.867,\
17.034])
magIA_ab_err = np.array([0.117,\
0.123,\
0.116,\
0.125,\
0.116,\
0.142,\
0.117,\
0.115,\
0.124,\
0.116,\
0.115,\
0.114,\
0.109,\
0.124,\
0.115,\
0.115,\
0.115,\
0.115,\
0.114])
# set prior characteristics
stan_dict3 = {}
stan_dict3['dmu_lower'], stan_dict3['dmu_upper'] = -2.0, 6.0
stan_dict3['m0xN4258_lower'], stan_dict3['m0xN4258_upper'] = -30.0, 10.0
stan_dict3['bW_lower'], stan_dict3['bW_upper'] = -5.0, 0.0
stan_dict3['ZW_lower'], stan_dict3['ZW_upper'] = -3.0, 3.0
stan_dict3['s_lower'], stan_dict3['s_upper'] = 0.0, 2.0
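Note that the Stan braces in the model string are doubled (`{{`, `}}`) so that Python's `str.format` can fill the `{…}` prior-bound placeholders while leaving Stan's own block delimiters intact. A minimal sketch of this mechanism, using a made-up one-line template:

```python
# str.format fills {name} placeholders; literal braces must be doubled.
template = "parameters {{ real<lower={s_lower}, upper={s_upper}> s1; }}"
bounds = {'s_lower': 0.0, 's_upper': 2.0}
print(template.format(**bounds))
# parameters { real<lower=0.0, upper=2.0> s1; }
```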
# create the model
stan_code3 = """
data {{
// define the data within Stan
int<lower=0> N; // number of data points
int<lower=0> n; // number of galaxies
int gal_nrs[N];
real magW[N];
real magIA[n];
real logP[N];
real dOH[N];
real<lower=0> sigma_magW[N];
real<lower=0> sigma_magIA[n];
real zp_N4258_mean;
real zp_N4258_sigma;
real mu0N4258_mean;
real mu0N4258_sigma;
}}
parameters {{
// model parameters
real<lower=mu0N4258_mean-2*mu0N4258_sigma, upper=mu0N4258_mean+2*mu0N4258_sigma> mu0N4258;
real<lower={dmu_lower}, upper={dmu_upper}> dmu[n];
real<lower={m0xN4258_lower}, upper={m0xN4258_upper}> m0xN4258;
real<lower={bW_lower}, upper={bW_upper}> bW;
real<lower={ZW_lower}, upper={ZW_upper}> ZW;
real<lower={s_lower}, upper={s_upper}> s1;
real<lower={s_lower}, upper={s_upper}> s2;
real<lower=zp_N4258_mean-2*zp_N4258_sigma, upper=zp_N4258_mean+2*zp_N4258_sigma> zp_N4258;
}}
transformed parameters{{
// construct the model
real theta1[N];
real theta2[n];
real tot_sigma1[N];
real tot_sigma2[n];
for (j in 1:N)
theta1[j] = dmu[gal_nrs[j]] + zp_N4258 + bW * logP[j] + ZW * dOH[j];
for (i in 1:n)
theta2[i] = dmu[i] + mu0N4258 + m0xN4258;
for (j in 1:N)
tot_sigma1[j] = sqrt(square(sigma_magW[j]) + square(s1));
for (i in 1:n)
tot_sigma2[i] = sqrt(square(sigma_magIA[i]) + square(s2));
}}
model {{
for (j in 1:n)
dmu[j] ~ uniform({dmu_lower}, {dmu_upper});
m0xN4258 ~ uniform({m0xN4258_lower}, {m0xN4258_upper});
bW ~ uniform({bW_lower}, {bW_upper});
ZW ~ uniform({ZW_lower}, {ZW_upper});
s1 ~ uniform({s_lower}, {s_upper});
s2 ~ uniform({s_lower}, {s_upper});
magW ~ normal(theta1, tot_sigma1);
magIA ~ normal(theta2, tot_sigma2);
zp_N4258 ~ normal(zp_N4258_mean, zp_N4258_sigma);
mu0N4258 ~ normal(mu0N4258_mean, mu0N4258_sigma);
}}
"""
sm3 = pystan.StanModel(model_code=stan_code3.format(**stan_dict3))
Nsamples = 20000 # number of samples
warmup = 10000 # number of burn-in samples (Stan default is half of Nsamples)
chains = 8 # number of chains
thin = 1 # discard every n=thin samples in the end
seed = 128 # random seed for starting the chain
# set the data
linear_data3 = {'N': nr_stars2,
'n' : nr_gals2,
'gal_nrs' : gal_nrs2+1, # Stan indexes from 1.
'logP': logP2,
'dOH' : dOH2,
'magW': magW2,
'sigma_magW' : mag_err2,
'magIA' : mag_IA_plus_ab,
'sigma_magIA' : magIA_ab_err,
'zp_N4258_mean' : zp_N4258,
'zp_N4258_sigma' : zp_N4258_sigma,
'mu0N4258_mean' : mu0N4258,
'mu0N4258_sigma' : mu0N4258_sigma}
# run the model
fit3 = sm3.sampling(data=linear_data3, iter=Nsamples, chains=chains, warmup=warmup, thin=thin, init='random', seed=seed)#, control={'adapt_delta' : 0.99})
samples3 = fit3.extract()
Below we show the results of the run, with corner plots and more readable histograms. The fitting parameter of main interest, $M_B^{0*} = M_B^0 + 5a_B$, was recovered with high precision. We can also sanity check this result by subtracting $5a_B \approx 3.56$, leaving $M_B^0 \approx -19.3$, which is close to the type 1a absolute magnitude known from the literature.
dmu3 = np.percentile(samples3['dmu'], [16,50,84], axis=0)
dmu3_sigma = (dmu3[2]-dmu3[0])/2
zp3 = np.percentile(samples3['zp_N4258'], [16,50,84])
M0xN4258 = np.percentile(samples3['m0xN4258'], [16,50,84])
bW3 = np.percentile(samples3['bW'], [16,50,84])
ZW3 = np.percentile(samples3['ZW'], [16,50,84])
s13 = np.percentile(samples3['s1'], [16,50,84])
s23 = np.percentile(samples3['s2'], [16,50,84])
print(f"M_B^0+5aB = {M0xN4258[1]:8.4f} (+{M0xN4258[2]-M0xN4258[1]:6.4f}, -{M0xN4258[1]-M0xN4258[0]:6.4f}) -- SNR={abs(M0xN4258[1])*2/(M0xN4258[2]-M0xN4258[0]):.2f}")
print(f"       zp = {zp3[1]:8.4f} (+{zp3[2]-zp3[1]:6.4f}, -{zp3[1]-zp3[0]:6.4f}) -- SNR={zp3[1]*2/(zp3[2]-zp3[0]):.2f}")
print(f" bW = {bW3[1]:8.4f} (+{bW3[2]-bW3[1]:6.4f}, -{bW3[1]-bW3[0]:6.4f}) -- SNR={abs(bW3[1])*2/(bW3[2]-bW3[0]):.2f}")
print(f" ZW = {ZW3[1]:8.4f} (+{ZW3[2]-ZW3[1]:6.4f}, -{ZW3[1]-ZW3[0]:6.4f}) -- SNR={abs(ZW3[1])*2/(ZW3[2]-ZW3[0]):.2f}")
print(f" s1 = {s13[1]:8.4f} (+{s13[2]-s13[1]:6.4f}, -{s13[1]-s13[0]:6.4f}) -- SNR={s13[1]*2/(s13[2]-s13[0]):.2f}")
print(f" s2 = {s23[1]:8.4f} (+{s23[2]-s23[1]:6.4f}, -{s23[1]-s23[0]:6.4f}) -- SNR={s23[1]*2/(s23[2]-s23[0]):.2f}")
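The sanity check described above can be sketched numerically (the fitted median for $M_B^{0*}$ used here is a hypothetical stand-in; $a_B = 0.71273$ is the R16 intercept):

```python
a_B = 0.71273             # redshift intercept from Riess et al. 2016
M0_star = -15.76          # hypothetical fitted median of M_B^{0*} = M_B^0 + 5*a_B
M_B0 = M0_star - 5 * a_B  # recover the plain absolute magnitude
print(M_B0)               # close to -19.3, the literature value for type 1a supernovae
```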
stackedsamples3 = np.vstack((*samples3['dmu'].T, samples3['zp_N4258'], samples3['bW'], samples3['ZW'], samples3['s1'], samples3['s2'])).T
fig = corner.corner(stackedsamples3, labels=[*[r"$dmu$ "+str(gal) for gal in gal_names], r"zp_N4258", r"$bW$", r"$ZW$", r"$s_1$", r"$s_2$"])
fig, ax = plt.subplots(2, 3, figsize=(18,4))
names_list = ['m0xN4258', 'mu0N4258', 'bW', 'ZW', 's1', 's2']
for i, name in enumerate(names_list):
sns.distplot(samples3[name], hist=False, ax=ax[i//3, i%3])
ax[i//3, i%3].set_title(name)
plt.tight_layout();
fig, ax = plt.subplots(4, 5, figsize=(18,8))
for i in range(19):
sns.distplot(samples3['dmu'][:,i], hist=False, ax=ax[i//5, i%5])
ax[i//5, i%5].set_title('dmu ' + gal_names[i])
ax[-1,-1].axis('off');
plt.tight_layout();
Finally, we are ready to insert the derived "absolute magnitude plus redshift intercept", $M_B^{0*} = M_B^0 + 5a_B$, into our equation for the Hubble Constant:
$$ H_0 = 10^{\frac{M_B^{0*} + 25}{5}} $$Below we plot the resulting distribution for $H_0$. Our best fit value is $H_0 = 70.35$, slightly lower than the 72.25 found in Riess et al. 2016, but within one sigma. Our error is also slightly larger than that of R16, ours being (+3.23, −3.09) against their ±2.51. These results agree within reasonable errors, and there appear to be no large biases. We noted earlier that our model seemed to overestimate the distance modulus to all galaxies, which could contribute to the difference we are seeing.
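As a sketch of where this expression comes from, combine the distance modulus with the Hubble law at small redshift ($D \approx cz/H_0$, with $D$ in Mpc and $cz$ in km/s), together with the R16 definition of the intercept $a_B$:

$$ \mu = 5\log_{10}\frac{D}{\mathrm{Mpc}} + 25, \qquad m_B^0 = M_B^0 + \mu, \qquad a_B = \log_{10} cz - 0.2\, m_B^0 $$

$$ \log_{10} H_0 = \log_{10} cz - \log_{10} D = \underbrace{\log_{10} cz - 0.2\, m_B^0}_{a_B} + 0.2\, M_B^0 + 5 = \frac{M_B^0 + 5a_B + 25}{5} = \frac{M_B^{0*} + 25}{5} $$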
# The uncertainty on a_B is already folded into the magIA errors, so m0xN4258 can be used directly.
H0_dist = 10**((samples3['m0xN4258'] + 25)/5)
plt.hist(H0_dist, bins=40, label="Derived H0");
x = np.linspace(60, 85, 1001)
# Gaussian for the R16 value, scaled by eye to match the histogram counts
plt.plot(x, 6200*np.exp( -(x - 72.25)**2/(2*2.51**2)), c="r", label="R16 H0");
plt.legend();
plt.xlabel("H0")
plt.ylabel("Probability Density (Unscaled)")
H0 = np.percentile(H0_dist, [16,50,84])
print(f"H0 = {H0[1]:.2f} (+{H0[2]-H0[1]:.2f} -{H0[1]-H0[0]:.2f}), giving an average standard error of {100*(H0[2]-H0[0])/(2*H0[1]):.2f}%")
Determining the Hubble Constant to ever-increasing accuracy is a field of huge interest these days, due to the inconsistency between values derived from the type 1a cosmic distance ladder and from cosmological fits to the CMB, which put the value at $H_0 = 66.93\pm 0.62$. Since the distance ladder approach has errors several times larger than this, it is a good place to investigate whether this apparent gap is real. If it turns out to be, as the evidence increasingly suggests, there is either a fatal flaw in the implementation of one of the approaches, or unknown physics coming into play.
Throughout our derivation of the Hubble Constant, we have made a series of assumptions and carried many error sources which could all affect the final result. Observational values of Cepheid and type 1a supernova photometry and spectroscopy, the standardization technique for type 1a supernovae, derivations of the redshift intercept, and several other derivations might have underestimated errors. As we have just seen from our own derivations, different techniques can alter the result noticeably. Our own results actually almost put the cosmological Hubble Constant within one sigma. Riess et al. have, however, employed different anchors and techniques, and none seem to correspond very well with the cosmological value.
As the accuracy of the derivations keeps increasing, it will become more and more difficult to claim the discrepancy is caused by some wrong measurement or underestimated error. The most probable candidates are either a wrongful assumption in one of the approaches, or physics not yet understood.